Background / Context:

Description of prior research and its intellectual context. 

Propensity score analysis (PSA) is a methodological technique that may correct for selection
bias in a quasi-experiment by modeling the selection process using observed covariates. Because
logistic regression is well understood by researchers in a variety of fields and easy to implement
in a number of popular software packages, it has traditionally been the most frequently used
method for modeling selection in PSA. The dependence on a single method is not due to a lack of
alternatives; any method that relates a binary outcome to multiple predictors can be used to
model selection. Rather, there is a perception among practitioners and methodologists that the
extant research on alternatives to logistic regression has not yet made a strong enough case for
considering a different method (Stuart, 2010; Steiner & Cook, in press).

There are, however, circumstances under which logistic regression may not perform well.
If the response surface is not a hyperplane, the logistic regression selection model requires
more than linear terms in order to capture the nonlinear relationships. Although polynomial and
interaction terms may be included in the logistic model to better approximate a nonlinear
selection process, when there are many covariates the number of candidate terms can be
overwhelmingly large. In addition, when the ratio of the number of covariates to the sample size
is high, the estimates produced by logistic regression can be unstable. Data mining methods such
as the neural network (NN; Ripley, 1996) and the support vector machine (SVM; Cortes &
Vapnik, 1995) are potentially useful in such situations because they are designed to handle
high-dimensional data and they automatically detect and model nonlinearities in the selection
surface, thus avoiding the need for iterative model respecification.
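To make the size of that search concrete, the short Python sketch below counts the candidate terms when every main effect, squared term, and two-way interaction of p covariates may enter a logistic selection model. The function name `candidate_terms` is hypothetical, and the count ignores cubic and higher-order interactions, so the real search space is larger still.

```python
from math import comb


def candidate_terms(p):
    """Count candidate terms when all main effects, squared terms, and two-way
    interactions of p covariates may enter a logistic selection model."""
    main = p
    squared = p
    interactions = comb(p, 2)  # p(p - 1) / 2 distinct pairs
    total = main + squared + interactions
    # each term can be in or out, so there are 2**total possible specifications
    return {"main": main, "squared": squared, "interactions": interactions,
            "total": total, "specifications": 2 ** total}


for p in (10, 12, 50):
    c = candidate_terms(p)
    print(p, c["interactions"], c["total"])
# 10 45 65
# 12 66 90
# 50 1225 1325
```

With the 12 covariates used in the simulation below there are already 90 candidate terms, so an exhaustive respecification search is impractical.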

Setoguchi, Schneeweiss, Brookhart, Glynn, & Cook (2008) compared the performance of
neural networks and main-effects-only logistic regression in a simulation study with ten
covariates. They found that neural networks outperformed logistic regression in terms of percent
bias reduction in some scenarios, including those in which the selection model was most
nonlinear and nonadditive. To the best of our knowledge, no other empirical investigation has
compared neural networks with logistic regression for the estimation of propensity scores. In a
review of potential alternatives to logistic regression for PS estimation, Westreich, Lessler, &
Funk (2010) noted that the SVM is promising because it is well suited to classification problems
with high-dimensional data and does not require specification of a parametric model. Ratkovic
(2012) adapted the SVM classifier to carry out case matching directly, estimating the average
treatment effect on the treated in a nonequivalent comparison group setting. However, the
performance of the SVM for propensity score estimation has not been examined in either a
simulation or a case study.

Purpose / Objective / Research Question / Focus of Study: 

Description of the focus of the research. 
Simulation Study 1: A simulation study explores the effect of the PS estimation method on (a) the
mean square error and standard error of the treatment effect estimates and (b) covariate balance
after conditioning on the estimated propensity scores via optimal full matching. Design factors in
the study include the PS estimation method (logistic regression, the neural network, and the
SVM), the data-generating selection model (linear and additive vs. nonlinear and nonadditive),
the data-generating outcome model (linear and additive vs. nonlinear and nonadditive), and the
number of covariates. The purpose of the simulation study is to examine the performance of the
data mining methods relative to logistic regression under the different data-generating scenarios
and for varying numbers of covariates.

Simulation Study 2: Both the NN and the SVM require the specification of tuning parameters.
When models are used for prediction, the tuning parameters are typically selected by running an
extensive grid search and choosing the parameters that minimize the cross-validated prediction
error. In the context of using data mining techniques to estimate propensity scores, however,
prediction is not the ultimate goal. McCaffrey, Ridgeway, & Morral (2004) used generalized
boosted modeling for PS estimation and recommended maximizing covariate balance instead of
minimizing prediction error. We develop a cross-validation procedure in R that maximizes
balance, as measured by the average absolute standardized mean difference on first- and
second-order terms, and evaluate its performance relative to minimizing prediction error via
simulation.
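The balance criterion can be sketched as follows. This is a minimal Python illustration, not the authors' R program: it computes the average absolute standardized mean difference (ASAMD) separately for main effects, squared terms, and two-way interactions, given matching weights that would come from full matching on an estimated PS, and then picks the tuning-parameter setting with the smallest overall ASAMD. All function names, the pooled-SD scaling, and the pooling of all first- and second-order terms into one overall score are assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def expand_terms(row):
    """Return (main effects, squared terms, two-way interactions) for one subject."""
    squared = [v * v for v in row]
    interactions = [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]
    return list(row), squared, interactions


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def asamd(X, z, w):
    """Average absolute standardized mean difference after conditioning.

    X: covariate rows; z: 0/1 treatment indicators; w: matching weights
    (e.g. from full matching on an estimated PS; supplied by the caller).
    ASSUMPTION: each difference is scaled by the unweighted pooled SD
    sqrt((s_t^2 + s_c^2) / 2); terms with zero pooled SD are skipped.
    """
    expanded = [expand_terms(row) for row in X]
    result, all_diffs = {}, []
    for k, name in enumerate(("main", "squared", "interactions")):
        diffs = []
        for col in zip(*(e[k] for e in expanded)):
            t = [v for v, zi in zip(col, z) if zi == 1]
            c = [v for v, zi in zip(col, z) if zi == 0]
            wt = [wi for wi, zi in zip(w, z) if zi == 1]
            wc = [wi for wi, zi in zip(w, z) if zi == 0]
            sd = sqrt((variance(t) + variance(c)) / 2)
            if sd > 0:
                diffs.append(abs(weighted_mean(t, wt) - weighted_mean(c, wc)) / sd)
        result[name] = mean(diffs) if diffs else float("nan")
        all_diffs += diffs
    result["all"] = mean(all_diffs)  # first- and second-order terms pooled
    return result


def select_by_balance(balance_by_setting):
    """Tuning-parameter setting with the smallest overall ASAMD."""
    return min(balance_by_setting, key=lambda s: balance_by_setting[s]["all"])


def select_by_prediction_error(cv_error_by_setting):
    """Conventional alternative: smallest cross-validated prediction error."""
    return min(cv_error_by_setting, key=cv_error_by_setting.get)
```

In a grid search, each candidate setting (for example a hypothetical hidden-layer size for the NN, or a cost and kernel width for the SVM) would be fit, matched on, and scored with `asamd`; the two selectors then differ only in which score they minimize.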

Significance / Novelty of study: 

Description of what is missing in previous work and the contribution the study makes. 

Since Rosenbaum and Rubin (1983), logistic regression has been the traditional choice for PS
estimation. The most important disadvantage of estimating propensity scores with logistic
regression is the need for iterative specification of the model, which can be time intensive and
comes with no guarantee of success, particularly with many covariates. A careful review of the
burgeoning PS estimation literature suggests that the neural network and the SVM are promising
alternatives to logistic regression: they avoid the need for respecification because they
automatically model nonlinearities in the selection response surface, and they are well suited to
high-dimensional data. Although promising, these two methods remain largely or completely
untested empirically in this context.

Through simulation, we examine the conditions under which logistic regression is
relatively robust to model misspecification and the conditions under which the neural network or
the support vector machine provides a less biased estimate of the effect of a treatment. We also
evaluate through simulation, and make available, a program written in R that carries out a
cross-validated grid search for the optimal tuning parameters of the data mining methods based
on maximizing balance rather than minimizing prediction error.

Statistical, Measurement, or Econometric Model: 

Description of the proposed new methods or novel applications of existing methods. 
Here we describe the data-generating models for Simulation Study 1 with 12 covariates and a
sample size of n = 2000. Let X = (X_1, X_2, ..., X_12) represent the 2000 × 12 matrix of
covariates. The X_j are independently generated from a standard normal distribution, and
correlations are introduced by Cholesky decomposition such that ρ(X_i, X_j) = 0.3 for all i ≠ j.
Let Z represent the dichotomous 2000 × 1 treatment assignment vector and Y the 2000 × 1
continuous outcome vector. Then e(X) = P(Z = 1 | X) is the propensity score vector. The
data-generating propensity score models are as follows.

Scenario A_ps: propensity score model is linear and additive (main effects only):

$$ e(X) = \left(1 + \exp\left\{-\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{12} X_{12}\right)\right\}\right)^{-1} $$


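Scenario B_ps: propensity score model is nonlinear and nonadditive (all main effects, five
two-way interactions, and four quadratic terms):

$$ e(X) = \left(1 + \exp\left\{-\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_{12} X_{12} + \beta_{13} X_1 X_{12} + \beta_{14} X_2 X_{12} + \beta_{15} X_2 X_{10} + \beta_{16} X_4 X_{12} + \beta_{17} X_1 X_8 + \beta_{18} X_2^2 + \beta_{19} X_5^2 + \beta_{20} X_8^2 + \beta_{21} X_{11}^2\right)\right\}\right)^{-1} $$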
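A minimal Python sketch of this data-generating process follows. It is an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the coefficients β must be supplied by the user because the abstract does not report them, and the interaction pairs and quadratic indices for Scenario B_ps are our reading of the Scenario B equations and should be treated as an assumption.

```python
import math
import random

# ASSUMPTION: 1-based (i, j) interaction pairs and squared-term indices for
# Scenario B_ps, as read from the Scenario B equations; verify before use.
INTERACTIONS = [(1, 12), (2, 12), (2, 10), (4, 12), (1, 8)]
QUADRATICS = [2, 5, 8, 11]


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def generate_covariates(n, p=12, rho=0.3, rng=None):
    """n x p standard normal covariates with corr(X_i, X_j) = rho for i != j."""
    rng = random.Random(0) if rng is None else rng
    sigma = [[1.0 if i == j else rho for j in range(p)] for i in range(p)]
    L = cholesky(sigma)
    X = []
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in range(p)]  # independent N(0, 1) draws
        # x = L u has covariance L L^T = sigma
        X.append([sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(p)])
    return X


def propensity(x, beta, scenario="A"):
    """e(x) = (1 + exp(-eta))^(-1); beta has 13 (Scenario A) or 22 (Scenario B) entries."""
    needed = 13 if scenario == "A" else 22
    if len(beta) < needed:
        raise ValueError(f"scenario {scenario} needs {needed} coefficients")
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:13], x))
    if scenario == "B":
        terms = [x[i - 1] * x[j - 1] for i, j in INTERACTIONS]
        terms += [x[k - 1] ** 2 for k in QUADRATICS]
        eta += sum(b * t for b, t in zip(beta[13:22], terms))
    return 1.0 / (1.0 + math.exp(-eta))


def assign_treatment(X, beta, scenario="A", rng=None):
    """Draw each Z_i from a Bernoulli distribution with success probability e(X_i)."""
    rng = random.Random(1) if rng is None else rng
    return [1 if rng.random() < propensity(x, beta, scenario) else 0 for x in X]
```

For example, `X = generate_covariates(2000)` followed by `assign_treatment(X, [0.0] + [0.3] * 12)` draws one replicate under Scenario A_ps with illustrative, not reported, coefficients.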

The data-generating outcome models are as follows.

Scenario A_oc: outcome model is linear and additive (main effects only):

$$ Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_{12} X_{12} + \varepsilon $$

Scenario B_oc: outcome model is nonlinear and nonadditive (all main effects, five two-way
interactions, and four quadratic terms):

$$ Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_{12} X_{12} + \alpha_{13} X_1 X_{12} + \alpha_{14} X_2 X_{12} + \alpha_{15} X_2 X_{10} + \alpha_{16} X_4 X_{12} + \alpha_{17} X_1 X_8 + \alpha_{18} X_2^2 + \alpha_{19} X_5^2 + \alpha_{20} X_8^2 + \alpha_{21} X_{11}^2 + \varepsilon $$

Here we give a brief overview of the data mining methods used to estimate propensity scores in
the simulation studies. The term neural network refers to a class of models inspired by theories
about how the human brain uses neurons to send messages. The back-propagation neural
network is made up of an input layer, which consists of the observed covariates; an output layer,
which consists of one unit for a dichotomous classification problem; and one hidden layer of
unobserved variables. The NN may be thought of as a nonlinear extension of logistic regression.
In particular, an NN with no hidden layer and a single dichotomous output with a logistic
activation is equivalent to logistic regression. The hidden layer applies a weighted transformation
of the data via an activation function, usually the logistic function f(x) = (1 + exp(−x))^(−1);
this hidden layer with a nonlinear activation function is what affords the NN its added flexibility.
Weight matrices Z and W contain the connection weights between the covariates and the hidden
layer and between the hidden layer and the output, respectively, and are estimated iteratively
through forward and backward passes through the network until a stopping criterion is met.
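The forward and backward passes described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: the class name, the hidden-layer size h, the weight initialization range, the learning rate, and the use of full-batch gradient descent on the cross-entropy loss are all assumptions. The attribute names Z and W mirror the weight matrices named in the text (Z here is not the treatment vector).

```python
import math
import random


def logistic(x):
    """Numerically stable logistic function f(x) = (1 + exp(-x))^(-1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class OneHiddenLayerNet:
    """p inputs -> h logistic hidden units -> one logistic output unit."""

    def __init__(self, p, h, seed=0):
        rng = random.Random(seed)
        # Z[j]: bias followed by input weights for hidden unit j (the text's Z).
        self.Z = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)] for _ in range(h)]
        # W: bias followed by hidden-to-output weights (the text's W).
        self.W = [rng.uniform(-0.5, 0.5) for _ in range(h + 1)]

    def forward(self, x):
        """Forward pass: return hidden activations and the estimated P(treat = 1 | x)."""
        hidden = [logistic(z[0] + sum(zk * xk for zk, xk in zip(z[1:], x))) for z in self.Z]
        out = logistic(self.W[0] + sum(wj * hj for wj, hj in zip(self.W[1:], hidden)))
        return hidden, out

    def loss(self, X, treat):
        """Mean cross-entropy (negative Bernoulli log-likelihood)."""
        eps = 1e-12
        total = 0.0
        for x, t in zip(X, treat):
            _, p_hat = self.forward(x)
            total -= t * math.log(p_hat + eps) + (1 - t) * math.log(1.0 - p_hat + eps)
        return total / len(X)

    def train_step(self, X, treat, lr=0.5):
        """One full-batch gradient-descent step using back-propagated gradients."""
        n = len(X)
        gW = [0.0] * len(self.W)
        gZ = [[0.0] * len(z) for z in self.Z]
        for x, t in zip(X, treat):
            hidden, out = self.forward(x)
            delta_out = out - t  # d(loss)/d(output pre-activation)
            gW[0] += delta_out
            for j, hj in enumerate(hidden):
                gW[j + 1] += delta_out * hj
                # backward pass through the logistic hidden unit j
                delta_h = delta_out * self.W[j + 1] * hj * (1.0 - hj)
                gZ[j][0] += delta_h
                for k, xk in enumerate(x):
                    gZ[j][k + 1] += delta_h * xk
        self.W = [w - lr * g / n for w, g in zip(self.W, gW)]
        self.Z = [[w - lr * g / n for w, g in zip(zs, gs)] for zs, gs in zip(self.Z, gZ)]
```

Repeating `train_step` until the loss stops improving plays the role of the stopping criterion; the fitted `forward(x)[1]` is then the estimated propensity score for covariate vector x.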

The support vector classifier classifies training data by constructing the hyperplane in the
covariate space that maximally separates the two classes, that is, the hyperplane with the widest
margin. If the data are noisy, it may not be possible to construct a hyperplane that perfectly
separates the points. This problem is dealt with by defining slack variables that are proportional
to how far a point lies on the wrong side of its margin. The margin is maximized subject to the
constraint that the sum of the slack variables be less than or equal to a constant. Thus, points
near the boundary play a larger role in shaping it, and points that are correctly classified and lie
outside the margin play no role in shaping the boundary. The support vector machine is a
nonlinear extension of the support vector classifier that maps the covariates into a
high-dimensional space via a kernel function, often the radial basis function

$$ K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), $$

where x_i and x_j are the covariate vectors of two subjects and γ > 0 is a tuning parameter. The
support vector classifier is then run in the transformed space. Linear boundaries in the
transformed space translate to nonlinear boundaries in the original space.
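The kernel and the resulting decision rule can be sketched as follows. This is a minimal Python illustration with hypothetical function names; the dual coefficients and intercept are placeholders that a fitted SVM would supply. The decision value f(x) is not itself a probability, so turning it into a propensity score requires an additional calibration step (for example, Platt scaling), which this sketch omits.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """Radial basis function kernel K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def gram_matrix(X, gamma):
    """Kernel (Gram) matrix of all pairwise K values; symmetric with ones on the diagonal."""
    return [[rbf_kernel(a, b, gamma) for b in X] for a in X]


def svm_decision(x, support_vectors, dual_coefs, intercept, gamma):
    """Kernel SVM decision value f(x) = sum_i c_i K(x_i, x) + b.

    dual_coefs[i] stands for alpha_i * y_i of support vector i (hypothetical
    inputs here; a fitted SVM would supply them). f(x) > 0 -> class 1.
    """
    return intercept + sum(c * rbf_kernel(sv, x, gamma)
                           for sv, c in zip(support_vectors, dual_coefs))
```

Because only support vectors carry nonzero coefficients, points that are correctly classified and outside the margin drop out of f(x), which is the property described above. The kernel width γ and the slack constant are the tuning parameters a grid search would cover.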

Usefulness / Applicability of Method: 

Demonstration of the usefulness of the proposed methods using hypothetical or real data. 

The results of the simulation study demonstrate that misspecification of the logistic regression PS
model can lead to gross bias in the estimate of the treatment effect when there are nonlinear or
nonadditive confounders. This can be seen in Table 1 in the B_ps B_oc condition, in which both
the selection and the outcome models are nonlinear and nonadditive. The absolute percent bias
for the misspecified logistic regression model in that condition is 145%, compared with 6% for
the NN and 28% for the SVM. The average standardized absolute mean difference (ASAMD)
reveals that although logistic regression attained better balance on the main effects (0.030 vs.
0.047 for the NN and 0.061 for the SVM), it failed to balance the higher order terms (0.125 on
squared terms and 0.134 on two-way interactions, vs. at most 0.060 and 0.071 for the data
mining methods). The NN and the SVM, which are fully automated algorithmic approaches,
achieved better overall balance, which resulted in substantially better estimates of the treatment
effect in this condition as measured by both the absolute bias and the mean-square error.
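The two accuracy metrics in Table 1 can be computed from the replicate estimates as sketched below. The definitions used here (absolute percent bias as |mean estimate − true effect| / |true effect| × 100, and mean-square error as the average squared deviation from the true effect) are standard but are assumptions, since the abstract does not spell them out; the function names and the example numbers are hypothetical.

```python
from statistics import mean, pvariance


def absolute_percent_bias(estimates, true_effect):
    """|mean estimate - true effect| / |true effect| * 100."""
    return abs(mean(estimates) - true_effect) / abs(true_effect) * 100.0


def mean_square_error(estimates, true_effect):
    """Average squared deviation of the replicate estimates from the true effect."""
    return mean((e - true_effect) ** 2 for e in estimates)


# Hypothetical replicate estimates for a true effect of 2.0
est = [2.0, 3.0]
print(absolute_percent_bias(est, 2.0))  # 25.0
print(mean_square_error(est, 2.0))      # 0.5
# MSE decomposes into squared bias plus (population) variance of the estimates
print((mean(est) - 2.0) ** 2 + pvariance(est))  # 0.5
```

The decomposition shows why the logistic regression MSE in the B_ps B_oc condition is dominated by bias: its standard error is comparable to the other methods, so the large MSE comes almost entirely from the squared bias term.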

Conclusions: 

Description of conclusions, recommendations, and limitations based on findings. 

Results of the simulation study demonstrate that when there are higher order confounders (i.e.,
nonlinear terms present in both the selection and the outcome models), misspecification of the
logistic PS estimation model can result in a severely biased estimate of the treatment effect. The
data mining techniques were less biased and had smaller mean square error in that case. The
simulation study further explores the effect of the number of covariates and of the number and
strength of higher order confounders on the performance of the PS estimation methods. We
develop and assess a cross-validation procedure that chooses tuning parameters by maximizing
balance rather than minimizing prediction error. Finally, we use the results of the simulations to
inform the reanalysis of several educational data sets that originally used (a) PSA with PSs
estimated by logistic regression or (b) a multiple regression approach. In particular, we check the
robustness of the treatment effect estimate across the different PS estimation methods and use
balance (or lack thereof) on higher order terms as an indicator of the appropriateness of the PS
estimation method.

Given the widespread use of propensity score methods in education and across a variety
of other substantive areas, improved estimates of treatment effects based on an appropriate
choice of PS estimation technique could have a substantial impact. We hope that the
recommendations based on the simulation study results will help researchers make informed
decisions about which propensity score estimation technique to use in their situation, in order to
maximize the accuracy and efficiency of their research.
Appendix B: Tables and Figures

Not included in page count. 


Table 1. Simulation results averaged over 1000 replications for the case with p = 12 covariates
and N = 2000 subjects. The average treatment effect was estimated by optimal full matching in
each case.


                                                        Scenario†
Metric                          Method*  A_ps/A_oc  A_ps/B_oc  B_ps/A_oc  B_ps/B_oc
Absolute bias (%)               LR            0.11       1.46      12.83     145.30
                                NN            2.43       3.81       3.07       5.57
                                SVM           6.61       7.44       6.82      28.16
Mean-square error (× 10^-2)     LR            0.63       1.26       0.63      34.57
                                NN            0.77       1.27       0.75       1.16
                                SVM           0.89       1.52       0.98       2.53
Standard error (× 10^-2)        LR            0.25       0.35       0.19       0.28
                                NN            0.28       0.35       0.27       0.33
                                SVM           0.29       0.38       0.30       0.36
ASAMD‡: main effects            LR           0.040      0.039      0.030      0.030
                                NN           0.047      0.046      0.048      0.047
                                SVM          0.050      0.060      0.062      0.061
ASAMD‡: squared terms           LR           0.047      0.047      0.125      0.125
                                NN           0.049      0.050      0.045      0.046
                                SVM          0.047      0.052      0.059      0.060
ASAMD‡: two-way interactions    LR           0.065      0.066      0.135      0.134
                                NN           0.062      0.063      0.060      0.060
                                SVM          0.064      0.074      0.071      0.071

* LR: logistic regression; NN: neural network; SVM: support vector machine.

† A_ps = data-generating PS model is linear and additive; B_ps = data-generating PS model is
nonlinear and nonadditive; A_oc = data-generating outcome model is linear and additive; B_oc =
data-generating outcome model is nonlinear and nonadditive.

‡ ASAMD: average standardized absolute mean difference of the covariates after PS matching.